Cyber Security


The true extent of cyber attacks on UK business - and the weak spots that allow them to happen

BBC News

The first day of September should have marked the beginning of one of the busiest periods of the year for Jaguar Land Rover. It was a Monday, and the release of new 75 series number plates was expected to produce a surge in demand from eager car buyers. At factories in Solihull and Halewood, as well as at its engine plant in Wolverhampton, staff were expecting to be working flat out. Instead, when the early shift arrived, they were sent home. The production lines have remained idle ever since.


CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data

ElZemity, Adel, Arief, Budi, Li, Shujun

arXiv.org Artificial Intelligence

The integration of large language models (LLMs) into cyber security applications presents significant opportunities, such as enhancing threat analysis and malware detection, but can also introduce critical risks and safety concerns, including personal data leakage and automated generation of new malware. To address these challenges, we developed CyberLLMInstruct, a dataset of 54,928 instruction-response pairs spanning cyber security tasks such as malware analysis, phishing simulations, and zero-day vulnerabilities. The dataset was constructed through a multi-stage process. This involved sourcing data from multiple resources, filtering and structuring it into instruction-response pairs, and aligning it with real-world scenarios to enhance its applicability. Seven open-source LLMs were chosen to test the usefulness of CyberLLMInstruct: Phi 3 Mini 3.8B, Mistral 7B, Qwen 2.5 7B, Llama 3 8B, Llama 3.1 8B, Gemma 2 9B, and Llama 2 70B. In our primary example, we rigorously assess the safety of fine-tuned models using the OWASP top 10 framework, finding that fine-tuning reduces safety resilience across all tested LLMs and every adversarial attack (e.g., the security score of Llama 3.1 8B against prompt injection drops from 0.95 to 0.15). In our second example, we show that these same fine-tuned models can also achieve up to 92.50 percent accuracy on the CyberMetric benchmark. These findings highlight a trade-off between performance and safety, showing the importance of adversarial testing and further research into fine-tuning methodologies that can mitigate safety risks while still improving performance across diverse datasets and domains. All scripts required to reproduce the dataset, along with examples and relevant resources for replicating our results, will be made available upon the paper's acceptance.
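A minimal sketch of how a prompt-injection security score of this kind could be computed is shown below; the `generate` callable and the refusal heuristic are hypothetical stand-ins for the paper's actual OWASP-style harness, not a reproduction of it.

```python
# Hypothetical sketch: score a model's resistance to prompt injection as the
# fraction of adversarial prompts it safely refuses. `generate` is any
# callable mapping a prompt string to a response string.

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "as an ai")

def is_safe_response(text: str) -> bool:
    """Crude proxy: treat an explicit refusal as the safe outcome."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def security_score(generate, injection_prompts: list[str]) -> float:
    """Fraction of adversarial prompts the model resists (1.0 = fully safe)."""
    safe = sum(is_safe_response(generate(p)) for p in injection_prompts)
    return safe / len(injection_prompts)
```

Under this reading, a score falling from 0.95 for the base model to 0.15 after fine-tuning would correspond to the Llama 3.1 8B prompt-injection result quoted above.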


Outlier-Oriented Poisoning Attack: A Grey-box Approach to Disturb Decision Boundaries by Perturbing Outliers in Multiclass Learning

Paracha, Anum, Arshad, Junaid, Farah, Mohamed Ben, Ismail, Khalid

arXiv.org Artificial Intelligence

Poisoning attacks are a primary threat to machine learning models, aiming to compromise their performance and reliability by manipulating training datasets. This paper introduces a novel attack - the Outlier-Oriented Poisoning (OOP) attack - which manipulates the labels of the samples most distant from the decision boundaries. The paper also investigates the adverse impact of such attacks on different machine learning algorithms within a multiclass classification scenario, analyzing their variance and the correlation between different poisoning levels and performance degradation. To ascertain the severity of the OOP attack for different degrees (5% - 25%) of poisoning, we analyzed variance, accuracy, precision, recall, f1-score, and false positive rate for the chosen ML models. Benchmarking our OOP attack, we analyzed key characteristics of multiclass machine learning algorithms and their sensitivity to poisoning attacks. Our experimentation used three publicly available datasets: IRIS, MNIST, and ISIC. Our analysis shows that KNN and GNB are the most affected algorithms, with accuracy decreasing by 22.81% and 56.07% and the false positive rate increasing to 17.14% and 40.45% for the IRIS dataset with 15% poisoning. Decision Trees and Random Forest are the most resilient algorithms, with the smallest accuracy disruption of 12.28% and 17.52% under 15% poisoning of the IRIS dataset. We also analyzed the correlation between the number of dataset classes and the performance degradation of models, finding that the number of classes is inversely proportional to the degradation: the drop in model accuracy is attenuated as the number of classes grows. Finally, our analysis identified that an imbalanced dataset distribution can aggravate the impact of poisoning on machine learning models.
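The core of the attack lends itself to a compact illustration. The sketch below is one plausible grey-box reading of the OOP idea, assuming the attacker trains a surrogate classifier and flips the labels of the samples with the largest decision margins; it is not the authors' reference implementation.

```python
# Illustrative outlier-oriented label flipping on IRIS (15% poisoning).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
surrogate = SVC(kernel="linear").fit(X, y)   # attacker's surrogate model

# Margin between the top two class scores: a large margin means the sample
# sits far from the decision boundaries, i.e., it is an "outlier" here.
scores = surrogate.decision_function(X)      # shape (n_samples, n_classes)
top2 = np.sort(scores, axis=1)[:, -2:]
margin = top2[:, 1] - top2[:, 0]

poison_rate = 0.15
n_poison = int(poison_rate * len(y))
outliers = np.argsort(margin)[-n_poison:]    # most distant samples

rng = np.random.default_rng(0)
y_poisoned = y.copy()
for i in outliers:                           # flip to a random wrong class
    y_poisoned[i] = rng.choice([c for c in np.unique(y) if c != y[i]])
```

Retraining the victim models on (X, y_poisoned) and comparing accuracy, recall, and false positive rate against the clean baseline would mirror the evaluation protocol described above.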


Contextualized AI for Cyber Defense: An Automated Survey using LLMs

Haryanto, Christoforus Yoga, Elvira, Anne Maria, Nguyen, Trung Duc, Vu, Minh Hieu, Hartanto, Yoshiano, Lomempow, Emily, Arakala, Arathi

arXiv.org Artificial Intelligence

This paper surveys the potential of contextualized AI in enhancing cyber defense capabilities, revealing significant research growth from 2015 to 2024. We identify a focus on robustness, reliability, and integration methods, while noting gaps in organizational trust and governance frameworks. Our study employs two LLM-assisted literature survey methodologies: (A) ChatGPT 4 for exploration, and (B) Gemma 2:9b for filtering with Claude 3.5 Sonnet for full-text analysis. We discuss the effectiveness and challenges of using LLMs in academic research, providing insights for future researchers.
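The screening stage of such a pipeline can be sketched in a few lines; `ask_llm` below is a hypothetical wrapper around whichever model does the filtering (Gemma 2:9b in methodology B), and the prompt wording is ours, not the paper's.

```python
# Hypothetical relevance-screening step for an LLM-assisted literature survey.
def screen_papers(ask_llm, papers: list[dict]) -> list[dict]:
    """Keep papers the screening model judges relevant; survivors proceed
    to full-text analysis by a stronger model."""
    kept = []
    for paper in papers:
        prompt = (
            "Does the following abstract concern contextualized AI for "
            "cyber defense? Answer YES or NO.\n\n"
            f"{paper['title']}\n{paper['abstract']}"
        )
        if ask_llm(prompt).strip().upper().startswith("YES"):
            kept.append(paper)
    return kept
```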


Generative AI and Large Language Models for Cyber Security: All Insights You Need

Ferrag, Mohamed Amine, Alwahedi, Fatima, Battah, Ammar, Cherif, Bilel, Mechri, Abdechakour, Tihanyi, Norbert

arXiv.org Artificial Intelligence

This paper provides a comprehensive review of the future of cybersecurity through Generative AI and Large Language Models (LLMs). We explore LLM applications across various domains, including hardware design security, intrusion detection, software engineering, design verification, cyber threat intelligence, malware detection, and phishing detection. We present an overview of LLM evolution and its current state, focusing on advancements in models such as GPT-4, GPT-3.5, Mixtral-8x7B, BERT, Falcon2, and LLaMA. Our analysis extends to LLM vulnerabilities, such as prompt injection, insecure output handling, data poisoning, DDoS attacks, and adversarial instructions. We delve into mitigation strategies to protect these models, providing a comprehensive look at potential attack scenarios and prevention techniques. Furthermore, we evaluate the performance of 42 LLM models in cybersecurity knowledge and hardware security, highlighting their strengths and weaknesses. We thoroughly evaluate cybersecurity datasets for LLM training and testing, covering the lifecycle from data creation to usage and identifying gaps for future research. In addition, we review new strategies for leveraging LLMs, including techniques like Half-Quadratic Quantization (HQQ), Reinforcement Learning with Human Feedback (RLHF), Direct Preference Optimization (DPO), Quantized Low-Rank Adapters (QLoRA), and Retrieval-Augmented Generation (RAG). These insights aim to enhance real-time cybersecurity defenses and improve the sophistication of LLM applications in threat detection and response. Our paper provides a foundational understanding and strategic direction for integrating LLMs into future cybersecurity frameworks, emphasizing innovation and robust model deployment to safeguard against evolving cyber threats.
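Of the adaptation strategies listed, QLoRA is the most directly reproducible in a few lines. The sketch below uses the Hugging Face transformers and peft libraries to load a 4-bit quantized base model and attach low-rank adapters; the model name and hyperparameters are illustrative choices, not values taken from the paper.

```python
# QLoRA in miniature: 4-bit quantized base weights, trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the frozen base model
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",          # illustrative base model
    quantization_config=bnb_config,
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # adapters on attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)        # only adapter weights are trained
```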


Large Language Models for Cyber Security: A Systematic Literature Review

Xu, HanXiang, Wang, ShenAo, Li, NingKe, Wang, KaiLong, Zhao, YanJie, Chen, Kai, Yu, Ting, Liu, Yang, Wang, HaoYu

arXiv.org Artificial Intelligence

The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for leveraging artificial intelligence in various domains, including cybersecurity. As the volume and sophistication of cyber threats continue to grow, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we conduct a comprehensive review of the literature on the application of LLMs in cybersecurity (LLM4Security). By comprehensively collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Through our analysis, we identify several key findings. First, we observe that LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, we find that the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides a comprehensive overview of the current state-of-the-art in LLM4Security and identifies several promising directions for future research.


How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

Banerjee, Somnath, Layek, Sayan, Hazra, Rima, Mukherjee, Animesh

arXiv.org Artificial Intelligence

In this study, we tackle a growing concern around the safety and ethical use of large language models (LLMs). Despite their potential, these models can be tricked into producing harmful or unethical content through various sophisticated methods, including 'jailbreaking' techniques and targeted manipulation. Our work zeroes in on a specific issue: to what extent LLMs can be led astray by asking them to generate responses that are instruction-centric, such as pseudocode, a program, or a software snippet, as opposed to vanilla text. To investigate this question, we introduce TechHazardQA, a dataset containing complex queries that should be answered in both text and instruction-centric formats (e.g., pseudocode), aimed at identifying triggers for unethical responses. We query a series of LLMs -- Llama-2-13b, Llama-2-7b, Mistral-V2 and Mistral 8X7B -- and ask them to generate both text and instruction-centric responses. For evaluation, we report the harmfulness score metric as well as judgements from GPT-4 and humans. Overall, we observe that asking LLMs to produce instruction-centric responses increases unethical response generation by ~2-38% across the models. As an additional objective, we investigate the impact of model editing using the ROME technique, which further increases the propensity for generating undesirable content. In particular, asking edited LLMs to generate instruction-centric responses further increases unethical response generation by ~3-16% across the different models.
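The headline ~2-38% figure is a relative change that can be expressed directly; the sketch below assumes a hypothetical harmfulness scorer `score` and a model callable with a format switch, which stand in for the paper's metric and the GPT-4/human judgements.

```python
# Hypothetical computation of the relative increase in harmfulness when the
# same queries are answered in an instruction-centric format vs. plain text.
def relative_increase(score, model, queries) -> float:
    text_scores = [score(model(q, fmt="text")) for q in queries]
    code_scores = [score(model(q, fmt="pseudocode")) for q in queries]
    mean = lambda xs: sum(xs) / len(xs)
    return (mean(code_scores) - mean(text_scores)) / mean(text_scores) * 100
```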


Software Repositories and Machine Learning Research in Cyber Security

Vanamala, Mounika, Bryant, Keith, Caravella, Alex

arXiv.org Artificial Intelligence

In today's rapidly evolving technological landscape and advanced software development, the rise in cyber security attacks has become a pressing concern. The integration of robust cyber security defenses has become essential across all phases of software development. It holds particular significance in identifying critical cyber security vulnerabilities at the initial stages of the software development life cycle, notably during the requirements phase. Through the utilization of cyber security repositories such as MITRE's Common Attack Pattern Enumeration and Classification (CAPEC) and the Common Vulnerabilities and Exposures (CVE) databases, attempts have been made to leverage topic modeling and machine learning to detect these early-stage vulnerabilities in the software requirements process. Past research has returned successful outcomes in attempting to automate vulnerability identification for software developers, employing a mixture of unsupervised machine learning methodologies such as LDA and topic modeling. Looking ahead, in our pursuit to improve automation and establish connections between software requirements and vulnerabilities, our strategy entails adopting a variety of supervised machine learning techniques. This array encompasses Support Vector Machines (SVM), Naïve Bayes, random forests, and neural networks, eventually transitioning into deep learning for our investigation. In the face of the escalating complexity of cyber security, whether machine learning can enhance the identification of vulnerabilities in diverse software development scenarios is a paramount consideration, offering crucial assistance to software developers in developing secure software.
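The unsupervised baseline this line of work builds on is easy to sketch. Below is a minimal LDA topic-modeling pass over vulnerability-style descriptions using scikit-learn; the three toy documents are stand-ins for real CVE/CAPEC entries.

```python
# Minimal LDA topic modeling over vulnerability descriptions (toy corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "sql injection in login form allows authentication bypass",
    "buffer overflow in parser leads to remote code execution",
    "cross site scripting in search field leaks session cookies",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
topic_mix = lda.transform(counts)   # per-document topic distribution
```

Matching software requirements against such topic distributions is one way earlier work linked requirements text to likely vulnerability classes, before moving to the supervised methods listed above.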


Adversarial Deep Reinforcement Learning for Cyber Security in Software Defined Networks

Borchjes, Luke, Nyirenda, Clement, Leenen, Louise

arXiv.org Artificial Intelligence

This paper focuses on the impact of leveraging autonomous offensive approaches in Deep Reinforcement Learning (DRL) to train more robust agents, exploring the effect of applying adversarial learning to DRL for autonomous security in Software Defined Networks (SDN). Two algorithms, Double Deep Q-Networks (DDQN) and Neural Episodic Control to Deep Q-Network (NEC2DQN or N2D), are compared. NEC2DQN was proposed in 2018 and is a new member of the deep Q-network (DQN) family of algorithms. The attacker has full observability of the environment and access to a causative attack that uses state manipulation in an attempt to poison the learning process. The attack is implemented under a white-box setting, in which the attacker has access to the defender's model and experiences. Two games are played: in the first, DDQN is the defender and N2D the attacker; in the second, the roles are reversed. The games are played twice, first without an active causative attack and then with one. For each configuration, three sets of game results are recorded, where a single set consists of 10 game runs. The before and after results are then compared to determine whether there was an improvement or a degradation. The results show that with minute parameter changes to the algorithms, the attacker's role grew stronger, as it was able to win games. The introduction of the causative attack in the adversarial learning setup showed that the algorithms are still able to defend the network according to their strengths.
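Two pieces of the setup can be sketched compactly: the Double DQN target the defender learns from, and a causative state manipulation a white-box attacker could apply to poison that learning. Both functions below are schematic; the paper's SDN environment, N2D opponent, and actual attack implementation are not reproduced here.

```python
# Schematic DDQN target and a simple causative state-manipulation attack.
import torch

def ddqn_target(q_online, q_target, reward, next_state, gamma=0.99):
    """Double DQN: the online net selects the action, the target net
    evaluates it, decoupling action selection from evaluation."""
    with torch.no_grad():
        best_action = q_online(next_state).argmax(dim=-1, keepdim=True)
        next_value = q_target(next_state).gather(-1, best_action).squeeze(-1)
    return reward + gamma * next_value

def poison_state(state, epsilon=0.1):
    """Causative attack sketch: bounded perturbation of stored observations
    before they enter the defender's update."""
    return state + epsilon * torch.sign(torch.randn_like(state))
```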


Building Resilient SMEs: Harnessing Large Language Models for Cyber Security in Australia

Kereopa-Yorke, Benjamin

arXiv.org Artificial Intelligence

The escalating digitalisation of our lives and enterprises has led to a parallel growth in the complexity and frequency of cyber-attacks. Small and medium-sized enterprises (SMEs), particularly in Australia, are experiencing increased vulnerability to cyber threats, posing a significant challenge to the nation's cyber security landscape. Embracing transformative technologies such as Artificial Intelligence (AI), Machine Learning (ML) and Large Language Models (LLMs) can potentially strengthen cyber security policies for Australian SMEs. However, their practical application, advantages, and limitations remain underexplored, with prior research mainly focusing on large corporations. This study aims to address this gap by providing a comprehensive understanding of the potential role of LLMs in enhancing cyber security policies for Australian SMEs. Employing a mixed-methods study design, this research includes a literature review, qualitative analysis of SME case studies, and a quantitative assessment of LLM performance metrics in cyber security applications. The findings highlight the promising potential of LLMs across various performance criteria, including relevance, accuracy, and applicability, though gaps remain in areas such as completeness and clarity. The study underlines the importance of integrating human expertise with LLM technology and refining model development to address these limitations. By proposing a robust conceptual framework guiding the effective adoption of LLMs, this research aims to contribute to a safer and more resilient cyber environment for Australian SMEs, enabling sustainable growth and competitiveness in the digital era.